Ames Housing: Fake Listing Detection Project
Team 일단잡1조 (송성필, 홍주형, 편서영, 양현준)
  • Overview
  • Chapter 2
  • Chapter 3
  • Chapter 4
  • Final fake listings
Where are you, fake listings?
Gotcha!
  • Detecting fake listings in the city of Ames
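Characteristics of the city of Ames
  • Location: central Iowa, USA, about 50 km north of Des Moines
  • Population: about 66,000 (as of 2020)
  • Characteristics:
  1. Home to Iowa State University (about 30,000 students, roughly 45% of the total population)
  2. Stable residential environment and an active rental market (about 55% of households rent)
  3. A young, education-centered city (people aged 20 to 34 make up about 40% of the population)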
Ames location and neighborhood distribution
  • Based on the average house price of each Ames neighborhood
  • Classified into high, mid, and low tiers
  • Each point represents an individual house

### Neighborhood house price tiers

| Price tier | Average price range (USD) | Main neighborhoods |
|---|---|---|
| 🔺 High | ≥ 217,676 | NoRidge, NridgHt, StoneBr, Veenker, Greens, Timber |
| ⚖️ Mid | 136,144 ~ 217,675 | CollgCr, Mitchel, NAmes, SawyerW, OldTown, Crawfor, Edwards |
| 🔻 Low | ≤ 136,143 | MeadowV, BrDale, IDOTRR, Landmrk, Blueste |

  • Neighborhoods are classified as high, mid, or low using quantiles of the mean SalePrice per Neighborhood
  • Top 25%: high tier; middle 50%: mid tier; bottom 25%: low tier
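
The quantile-based tiering described above can be sketched as follows. This is a minimal illustration on made-up prices, not the project's actual code: the function name `classify_neighborhoods`, the toy neighborhoods, and the choice to treat the 25% and 75% boundaries as inclusive are all assumptions.

```python
import pandas as pd


def classify_neighborhoods(df: pd.DataFrame) -> pd.Series:
    """Label each Neighborhood High / Mid / Low by its mean SalePrice."""
    mean_price = df.groupby("Neighborhood")["SalePrice"].mean()
    q25, q75 = mean_price.quantile([0.25, 0.75])

    # Assumed boundary handling: >= q75 is High, <= q25 is Low, the rest is Mid.
    def tier(price: float) -> str:
        if price >= q75:
            return "High"
        if price <= q25:
            return "Low"
        return "Mid"

    return mean_price.map(tier).rename("Price_Level")


# Toy data with hypothetical neighborhoods and prices (not the Ames dataset).
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "SalePrice": [100_000, 120_000, 150_000, 170_000,
                  200_000, 220_000, 300_000, 320_000],
})
levels = classify_neighborhoods(toy)
print(levels)
```

Computing the quantiles on the per-neighborhood means, rather than on individual sale prices, keeps each neighborhood in exactly one tier.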

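🔺 High tier (High)

  • High average house prices with a concentration of upscale homes
  • Recently built or remodeled, with high quality
  • Large floor areas and full amenities
  • Mostly upscale single-family homes in a quiet environment
  • Few transactions, but the homes are scarce

⚖️ Mid tier (Mid)

  • Houses around the Ames average
  • Varied housing types (single-family homes, townhouses, and so on)
  • Rental demand from young people and university students
  • Good infrastructure; popular with families
  • Many transactions and an active market

🔻 Low tier (Low)

  • Low average house prices; some homes are aging
  • Maintenance condition is average to below average
  • High share of small rental homes
  • Less preferred because of noise, proximity to commercial areas, and similar factors
  • Few transactions and limited information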

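📌 Analysis process

Each listing is scored against six conditions, and listings with a score of 3 or more are extracted as suspected fake listings.
Suspected fake listings are then extracted with a regression model, and the listings flagged by both methods are taken as the common fake listings.

🏘 Grouping by price tier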
Neighborhood Price_Level
0 NridgHt High
1 Timber High
2 Somerst High
3 NoRidge High
4 GrnHill High
5 StoneBr High
6 Veenker High
7 NWAmes Mid
8 Blmngtn Mid
9 Mitchel Mid
10 NAmes Mid
11 CollgCr Mid
12 SawyerW Mid
13 Gilbert Mid
14 Sawyer Mid
15 Crawfor Mid
16 Greens Mid
17 ClearCr Mid
18 NPkVill Mid
19 Blueste Mid
20 Landmrk Mid
21 SWISU Low
22 Edwards Low
23 IDOTRR Low
24 OldTown Low
25 MeadowV Low
26 BrDale Low
27 BrkSide Low
✔ OverallQual : overall quality
✔ OverallCond : overall condition
✔ GrLivArea : above-ground living area
✔ YearRemodAdd : remodeling year
✔ RoomDensity : room density (number of rooms / area)
✔ Amenities : amenity score
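
As a small worked example of the room density feature (number of rooms divided by area), the sketch below assumes the room count comes from the Ames column `TotRmsAbvGrd`; the source does not name the column, and the room counts 7 and 8 are inferred for illustration. The two `GrLivArea` values are taken from the common fake listings table later in this document.

```python
import pandas as pd

# Room counts are assumed; GrLivArea values match rows 1752 and 374
# of the common fake listings table.
houses = pd.DataFrame(
    {"TotRmsAbvGrd": [7, 8], "GrLivArea": [1229, 1290]},
    index=[1752, 374],
)

# RoomDensity = number of rooms / above-ground living area (rooms per sq ft).
houses["RoomDensity"] = houses["TotRmsAbvGrd"] / houses["GrLivArea"]
print(houses.round(6))
```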
📊 **High group analysis**

Condition flags:
  • flag_high_qual: 3
  • flag_good_condition: 22
  • flag_high_area: 10
  • flag_high_remod: 76
  • flag_high_density: 93
  • flag_high_amenities: 3

Score distribution:
  • 0: 76
  • 1: 103
  • 2: 47
  • 3: 2
  • 4: 1

Listings with score ≥ 3: 3

📊 **Mid group analysis**

Condition flags:
  • flag_mid_qual: 2
  • flag_good_condition: 366
  • flag_mid_area: 53
  • flag_mid_remod: 107
  • flag_mid_density: 296
  • flag_mid_amenities: 37

Score distribution:
  • 0: 158
  • 1: 348
  • 2: 172
  • 3: 51
  • 4: 4

Listings with score ≥ 3: 55

📊 **Low group analysis**

Condition flags:
  • flag_low_qual: 4
  • flag_good_condition: 30
  • flag_low_area: 24
  • flag_low_remod: 59
  • flag_low_density: 131
  • flag_low_amenities: 10

Score distribution:
  • 0: 133
  • 1: 155
  • 2: 30
  • 3: 13
  • 4: 1

Listings with score ≥ 3: 14
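
The flag-and-score step can be sketched roughly as below. The source does not state the thresholds or the direction of each condition, so the `thresholds` values, the "at or above" comparison, and the helper name `score_listings` are assumptions made only for illustration; the data is a toy sample, not the Ames dataset.

```python
import pandas as pd

FEATURES = ["OverallQual", "OverallCond", "GrLivArea",
            "YearRemodAdd", "RoomDensity", "amenities"]


def score_listings(group: pd.DataFrame, thresholds: dict,
                   min_score: int = 3) -> pd.DataFrame:
    """Add one boolean flag per feature, sum them, and mark score >= min_score."""
    out = group.copy()
    # Assumed rule: a flag fires when the feature is at or above its threshold.
    flags = pd.DataFrame(
        {f"flag_{col}": out[col] >= thresholds[col] for col in FEATURES}
    )
    out["score"] = flags.sum(axis=1)
    out["suspect"] = out["score"] >= min_score
    return out


# Hypothetical thresholds for one price group.
thresholds = {"OverallQual": 8, "OverallCond": 7, "GrLivArea": 2000,
              "YearRemodAdd": 2000, "RoomDensity": 0.006, "amenities": 3}

toy = pd.DataFrame({
    "OverallQual": [9, 5, 8],
    "OverallCond": [8, 5, 7],
    "GrLivArea": [2500, 1200, 1500],
    "YearRemodAdd": [2005, 1970, 2001],
    "RoomDensity": [0.007, 0.004, 0.005],
    "amenities": [3, 1, 2],
})
scored = score_listings(toy, thresholds)
print(scored[["score", "suspect"]])
```

Running the function once per price group, each with its own thresholds, would mirror the separate High, Mid, and Low analyses above.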

❗ Condition flag results

72 suspected fake listing candidates extracted
The map on the right shows how the suspected fake listings are distributed by group

Neighborhood price_level
OldTown Low
Mitchel Mid
Mitchel Mid
MeadowV Low
OldTown Low
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
NoRidge High
CollgCr Mid
CollgCr Mid
Mitchel Mid
NAmes Mid
Sawyer Mid
Somerst High
NAmes Mid
Crawfor Mid
NAmes Mid
NAmes Mid
MeadowV Low
NAmes Mid
NAmes Mid
NAmes Mid
Crawfor Mid
NAmes Mid
Sawyer Mid
NAmes Mid
BrkSide Low
NAmes Mid
Timber High
NAmes Mid
Crawfor Mid
OldTown Low
NAmes Mid
SawyerW Mid
Sawyer Mid
BrkSide Low
NAmes Mid
OldTown Low
NAmes Mid
Edwards Low
Timber High
SawyerW Mid
NAmes Mid
NAmes Mid
Sawyer Mid
Sawyer Mid
NAmes Mid
NAmes Mid
Mitchel Mid
NAmes Mid
Mitchel Mid
Veenker High
Edwards Low
CollgCr Mid
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
CollgCr Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
SawyerW Mid
OldTown Low
BrkSide Low
NAmes Mid
Veenker High
Veenker High
Sawyer Mid
NAmes Mid
Sawyer Mid
SawyerW Mid
NWAmes Mid
Sawyer Mid
NAmes Mid
Sawyer Mid
NAmes Mid
Sawyer Mid
NAmes Mid
CollgCr Mid
NAmes Mid
NAmes Mid
Sawyer Mid
Crawfor Mid
Blueste Mid
NAmes Mid
Sawyer Mid
Sawyer Mid
Sawyer Mid
Veenker High
Veenker High
NAmes Mid
Mitchel Mid
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
Mitchel Mid
Sawyer Mid
NAmes Mid
ClearCr Mid
Mitchel Mid
CollgCr Mid
Gilbert Mid
Mitchel Mid
NAmes Mid
Somerst High
Greens Mid
Edwards Low
SawyerW Mid
CollgCr Mid
Sawyer Mid
NAmes Mid
NAmes Mid
Sawyer Mid
NPkVill Mid
SawyerW Mid
Crawfor Mid
NAmes Mid
CollgCr Mid
NAmes Mid
NAmes Mid
Sawyer Mid
Sawyer Mid
NWAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
Blueste Mid
Sawyer Mid
Mitchel Mid
NAmes Mid
NAmes Mid
NAmes Mid
NWAmes Mid
NAmes Mid
OldTown Low
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
Sawyer Mid
Sawyer Mid
Mitchel Mid
NAmes Mid
OldTown Low
Sawyer Mid
Blueste Mid
IDOTRR Low
OldTown Low
NAmes Mid
NAmes Mid
Mitchel Mid
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
NWAmes Mid
CollgCr Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
CollgCr Mid
NoRidge High
Sawyer Mid
NAmes Mid
NAmes Mid
CollgCr Mid
NAmes Mid
OldTown Low
CollgCr Mid
CollgCr Mid
NAmes Mid
NAmes Mid
Sawyer Mid
Mitchel Mid
NAmes Mid
NWAmes Mid
Timber High
Timber High
NPkVill Mid
NAmes Mid
CollgCr Mid
Sawyer Mid
OldTown Low
NAmes Mid
CollgCr Mid
CollgCr Mid
NAmes Mid
NAmes Mid
Blueste Mid
NAmes Mid
NAmes Mid
Edwards Low
NAmes Mid
NWAmes Mid
NAmes Mid
NWAmes Mid
SawyerW Mid
CollgCr Mid
NAmes Mid
Sawyer Mid
Timber High
NAmes Mid
CollgCr Mid
Sawyer Mid
Crawfor Mid
CollgCr Mid
OldTown Low
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
NAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
Blueste Mid
NPkVill Mid
Crawfor Mid
NAmes Mid
NAmes Mid
Mitchel Mid
NAmes Mid
Somerst High
CollgCr Mid
NAmes Mid
NAmes Mid
NAmes Mid
Crawfor Mid
Blueste Mid
Sawyer Mid
Sawyer Mid
NAmes Mid
Veenker High
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
CollgCr Mid
Sawyer Mid
NWAmes Mid
NAmes Mid
Sawyer Mid
CollgCr Mid
NAmes Mid
CollgCr Mid
NAmes Mid
Mitchel Mid
NPkVill Mid
Mitchel Mid
NAmes Mid
Somerst High
NAmes Mid
Mitchel Mid
SawyerW Mid
Crawfor Mid
SawyerW Mid
NAmes Mid
CollgCr Mid
Sawyer Mid
OldTown Low
NAmes Mid
Veenker High
Edwards Low
Mitchel Mid
NAmes Mid
Sawyer Mid
Sawyer Mid
NAmes Mid
Timber High
Sawyer Mid
Mitchel Mid
NAmes Mid
NWAmes Mid
Crawfor Mid
Sawyer Mid
NAmes Mid
Mitchel Mid
NAmes Mid
Sawyer Mid
OldTown Low
NAmes Mid
NAmes Mid
NAmes Mid
NPkVill Mid
NAmes Mid
OldTown Low
Edwards Low
CollgCr Mid
Blueste Mid
NAmes Mid
SawyerW Mid
Sawyer Mid
NAmes Mid
NPkVill Mid
NAmes Mid
NAmes Mid
CollgCr Mid
BrkSide Low
NAmes Mid
Sawyer Mid
NPkVill Mid
NAmes Mid
Mitchel Mid
NAmes Mid
Mitchel Mid
NPkVill Mid
NAmes Mid
NWAmes Mid
NAmes Mid
Sawyer Mid
NAmes Mid
OldTown Low
NWAmes Mid
NAmes Mid
OldTown Low
IDOTRR Low
NAmes Mid
NAmes Mid
NAmes Mid
Mitchel Mid
Mitchel Mid
NAmes Mid
NAmes Mid
Mitchel Mid
NAmes Mid
NAmes Mid
NAmes Mid
Veenker High
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
Mitchel Mid
NAmes Mid
NWAmes Mid
Blueste Mid
NWAmes Mid
NAmes Mid
NAmes Mid
Crawfor Mid
Mitchel Mid
Timber High
Edwards Low
Sawyer Mid
Gilbert Mid
Sawyer Mid
NWAmes Mid
Crawfor Mid
Sawyer Mid
Crawfor Mid
Timber High
NAmes Mid
Crawfor Mid
NAmes Mid
Sawyer Mid
CollgCr Mid
CollgCr Mid
NAmes Mid
CollgCr Mid
NPkVill Mid
Sawyer Mid
NridgHt High
NAmes Mid
Mitchel Mid
NAmes Mid
NAmes Mid
Sawyer Mid
CollgCr Mid
Mitchel Mid
Crawfor Mid
Veenker High
NAmes Mid
Sawyer Mid
NAmes Mid
NWAmes Mid
NAmes Mid
NPkVill Mid
IDOTRR Low
NAmes Mid
BrDale Low
NAmes Mid
CollgCr Mid
NAmes Mid
Sawyer Mid
Mitchel Mid
Mitchel Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
NAmes Mid
OldTown Low
NAmes Mid
NAmes Mid
Crawfor Mid
StoneBr High
Mitchel Mid
Sawyer Mid
NAmes Mid
Sawyer Mid
NAmes Mid
NAmes Mid
NAmes Mid
Mitchel Mid
OldTown Low
CollgCr Mid
NAmes Mid
Edwards Low
Crawfor Mid
NAmes Mid
BrkSide Low
NAmes Mid
Timber High
Sawyer Mid
NPkVill Mid
Edwards Low
IDOTRR Low
Sawyer Mid
Sawyer Mid
NWAmes Mid
CollgCr Mid
OldTown Low
NAmes Mid
OldTown Low
NAmes Mid
NAmes Mid
NAmes Mid
Crawfor Mid

📌 Regression analysis

❗ Applying a regression model to find fake listings makes it possible to compare what it shares with, and how it differs from, the fake listings shortlisted by the scoring method.

✔ Dependent variable: 'SalePrice'
✔ Independent variables: 'OverallQual', 'OverallCond', 'GrLivArea', 'YearRemodAdd', 'RoomDensity', 'amenities'
The independent variables are taken from the six conditions used in the scoring method, which makes the two approaches comparable.

🔍 Regression analysis steps
1. Ridge regression is applied so that every variable keeps its influence; Ridge shrinks coefficients but does not set them to zero.
2. The data is split into 80% training and 20% test sets, and 5-fold cross-validation is performed.
3. Cross-validation secures model stability; several regularization strengths (α) are tested and the model with the best predictive performance is selected.
4. 'neg_mean_squared_error' (negative mean squared error) is used as the performance metric.
5. scikit-learn scorers follow the convention that a higher score means a better model, so the error metric is negated.
6. To identify fake listings, the residual (actual price minus predicted price) is computed, and the lowest 2.8% (72/2579) are classified as fake listings.
7. This is the same proportion as the number of fake listings found by the scoring method, so the results of the two methods can be compared directly.
8. A separate model is built for each price tier (Low, Mid, High) so that detection reflects the characteristics of each price range.
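
The regression steps above can be sketched as follows. This is a hedged illustration on synthetic data, not the project's notebook: the random features, coefficients, and variable names are invented, and only the `RidgeCV` settings (ten log-spaced alphas from 1e-4 to 10, `cv=5`, `scoring='neg_mean_squared_error'`) and the 2.8% cutoff come from the document. In the project this would be repeated once per price group.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one price group: 6 features, price-like target.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))
true_coef = np.array([30_000, 5_000, 20_000, 8_000, -3_000, 4_000])
y = 180_000 + X @ true_coef + rng.normal(scale=15_000, size=n)

# 80% training / 20% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Ten alphas from 1e-4 to 10, evaluated with 5-fold cross-validation.
alphas = np.logspace(-4, 1, 10)
model = RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error")
model.fit(X_train, y_train)

r2_test = model.score(X_test, y_test)  # R² on the held-out 20%

# Residual = actual price - predicted price; the lowest 2.8% are flagged.
residual = y - model.predict(X)
cutoff = np.quantile(residual, 0.028)
is_suspect = residual <= cutoff

print(f"best alpha={model.alpha_}, test R2={r2_test:.3f}, "
      f"flagged={int(is_suspect.sum())}")
```

A strongly negative residual means the listing sold far below what its features predict, which is the signal this approach treats as suspicious.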
  • Low group
  • Mid group
  • High group
RidgeCV(alphas=array([1.00000000e-04, 3.59381366e-04, 1.29154967e-03, 4.64158883e-03,
       1.66810054e-02, 5.99484250e-02, 2.15443469e-01, 7.74263683e-01,
       2.78255940e+00, 1.00000000e+01]),
        cv=5, scoring='neg_mean_squared_error')
Explained variance (R²): 0.633
Optimal α (alpha): 10
Low group fake listing summary
▶ Total samples: 662
▶ Fake listings: 19 (2.9%)
Fake listing list (sorted by residual)
Neighborhood SalePrice predicted residual
309 Edwards 184750 341478.416904 -156728.416904
2204 OldTown 90000 165060.186265 -75060.186265
469 OldTown 122000 196637.789395 -74637.789395
1909 OldTown 97500 165711.760154 -68211.760154
740 IDOTRR 40000 100640.665189 -60640.665189
116 OldTown 159500 219067.388226 -59567.388226
1214 OldTown 107500 165796.538413 -58296.538413
254 OldTown 133900 187704.330474 -53804.330474
1436 OldTown 106000 158650.035771 -52650.035771
677 OldTown 103500 155031.742968 -51531.742968
374 BrkSide 106900 158205.245880 -51305.245880
1225 OldTown 117000 165247.245319 -48247.245319
2277 IDOTRR 123000 171019.817386 -48019.817386
2025 OldTown 117500 163763.677821 -46263.677821
205 IDOTRR 50000 95575.542686 -45575.542686
528 IDOTRR 89500 130977.645762 -41477.645762
1064 OldTown 64500 105347.680388 -40847.680388
427 OldTown 12789 53436.186714 -40647.186714
22 MeadowV 98000 137418.527580 -39418.527580
Low group fake listing location map
RidgeCV(alphas=array([1.00000000e-04, 3.59381366e-04, 1.29154967e-03, 4.64158883e-03,
       1.66810054e-02, 5.99484250e-02, 2.15443469e-01, 7.74263683e-01,
       2.78255940e+00, 1.00000000e+01]),
        cv=5, scoring='neg_mean_squared_error')
Explained variance (R²): 0.746
Optimal α (alpha): 0.0001
Mid group fake listing summary
▶ Total samples: 1464
▶ Fake listings: 41 (2.8%)
Fake listing list (sorted by residual)
Neighborhood SalePrice predicted residual
180 NWAmes 82500 186512.846249 -104012.846249
997 NAmes 84900 164088.769037 -79188.769037
1262 Sawyer 112000 189448.582663 -77448.582663
1703 Gilbert 164000 237690.994626 -73690.994626
607 SawyerW 131000 203319.538648 -72319.538648
232 NAmes 97500 165959.754915 -68459.754915
748 NWAmes 154000 222452.575310 -68452.575310
777 NAmes 140000 207071.732788 -67071.732788
2207 Sawyer 158000 222634.022503 -64634.022503
1735 NAmes 180000 239892.023341 -59892.023341
1777 Sawyer 130500 189388.251889 -58888.251889
328 NAmes 100000 157610.279322 -57610.279322
1592 NAmes 152500 209835.479856 -57335.479856
1973 NAmes 110000 167110.808559 -57110.808559
2399 Crawfor 135000 191693.656661 -56693.656661
2478 Crawfor 149000 205373.529713 -56373.529713
1392 Mitchel 115000 170705.874780 -55705.874780
379 NAmes 104900 160448.835040 -55548.835040
2165 Crawfor 137000 192003.206946 -55003.206946
1085 NAmes 132000 184829.510696 -52829.510696
1790 NAmes 139000 191731.983243 -52731.983243
445 Sawyer 112000 164354.269125 -52354.269125
289 NAmes 167000 219307.354724 -52307.354724
2293 ClearCr 148400 200423.025348 -52023.025348
79 SawyerW 67500 119207.948646 -51707.948646
2044 NAmes 133000 184566.013933 -51566.013933
1259 NWAmes 170000 219375.681066 -49375.681066
1533 Sawyer 62383 111275.071760 -48892.071760
1955 Crawfor 191000 239871.517587 -48871.517587
1557 SawyerW 138500 186704.367535 -48204.367535
478 Sawyer 119500 166866.155117 -47366.155117
2113 Crawfor 145000 190775.062653 -45775.062653
657 NPkVill 123000 168657.234940 -45657.234940
2085 Gilbert 115000 159734.356699 -44734.356699
112 NWAmes 185000 229329.132298 -44329.132298
1276 NAmes 143000 187007.097643 -44007.097643
2397 NAmes 242000 285697.477460 -43697.477460
661 NAmes 135000 178659.552930 -43659.552930
585 Mitchel 160000 203299.251232 -43299.251232
793 Sawyer 121500 164707.980590 -43207.980590
1752 Blueste 121000 164024.916824 -43024.916824
Mid group fake listing location map
RidgeCV(alphas=array([1.00000000e-04, 3.59381366e-04, 1.29154967e-03, 4.64158883e-03,
       1.66810054e-02, 5.99484250e-02, 2.15443469e-01, 7.74263683e-01,
       2.78255940e+00, 1.00000000e+01]),
        cv=5, scoring='neg_mean_squared_error')
Explained variance (R²): 0.730
Optimal α (alpha): 0.0001
High group fake listing summary
▶ Total samples: 453
▶ Fake listings: 13 (2.9%)
Fake listing list (sorted by residual)
Neighborhood SalePrice predicted residual
275 Veenker 150000 377583.040255 -227583.040255
1008 Timber 204000 331268.591361 -127268.591361
1686 NoRidge 285000 383469.594440 -98469.594440
111 Somerst 172500 267553.768343 -95053.768343
1310 StoneBr 270000 357997.110005 -87997.110005
300 Somerst 280750 363476.267632 -82726.267632
1398 Timber 202900 281247.127845 -78347.127845
1278 Somerst 170000 248322.760527 -78322.760527
1495 NoRidge 248000 324732.678440 -76732.678440
949 NoRidge 290000 364891.692276 -74891.692276
51 Somerst 193800 268322.417291 -74522.417291
1411 StoneBr 130000 204137.130175 -74137.130175
2134 Somerst 345000 416042.158581 -71042.158581
High group fake listing location map

Fake listings from scoring

72

Fake listings from regression

73

Final selected fake listings

8

  • Scoring method
  • Regression analysis
  • Common fake listings
Neighborhood PID SalePrice score OverallQual OverallCond GrLivArea YearRemodAdd RoomDensity amenities
1752 Blueste 909451140 121000 3 6 6 1229 1980 0.005696 2
374 BrkSide 903225160 106900 4 6 9 1290 2000 0.006202 3
2165 Crawfor 909254010 137000 3 7 8 1228 1990 0.005700 2
2399 Crawfor 909254100 135000 3 6 8 1461 1991 0.004791 2
2478 Crawfor 909275050 149000 3 7 6 1502 2000 0.005326 2
2113 Crawfor 909275020 145000 3 6 6 1958 1950 0.005107 2
2477 Edwards 909101010 110000 4 6 8 1196 2000 0.006689 3
2085 Gilbert 527226020 115000 3 6 2 1474 1952 0.005427 3
585 Mitchel 923400040 160000 4 6 7 1750 1985 0.005143 3
997 NAmes 534427010 84900 3 5 6 1728 2001 0.006944 1
2044 NAmes 534479130 133000 3 6 7 1578 1950 0.003802 2
777 NAmes 534477110 140000 3 6 8 1668 2005 0.004796 2
1085 NAmes 535450210 132000 3 6 8 1224 2004 0.004902 2
1592 NAmes 535353240 152500 3 7 7 1527 1999 0.005239 2
1276 NAmes 535450310 143000 3 6 6 1846 1950 0.004875 2
1790 NAmes 535175030 139000 3 6 6 1632 1988 0.004902 2
748 NWAmes 527352150 154000 3 7 6 2050 1978 0.005366 2
1225 OldTown 903430090 117000 3 6 8 1635 2003 0.003670 2
1909 OldTown 903476090 97500 3 7 5 1864 2000 0.006438 1
2207 Sawyer 905225020 158000 3 5 4 2654 1996 0.005275 3
478 Sawyer 532351150 119500 3 6 6 1654 1977 0.007255 1
793 Sawyer 905103060 121500 3 5 6 1392 1996 0.005029 2
1777 Sawyer 533352170 130500 3 6 8 1479 2005 0.006085 2
445 Sawyer 905226050 112000 3 5 7 1416 2007 0.006356 2
607 SawyerW 906226060 131000 3 5 7 2016 2007 0.003968 1
1557 SawyerW 906425045 138500 3 6 8 1445 1993 0.005536 2
1008 Timber 916403200 204000 4 6 8 2237 2006 0.004470 3
275 Veenker 533350090 150000 3 9 3 2944 1977 0.004416 2

📍 The two methods produce different results because they detect fake listings from different perspectives.
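
Extracting the listings that both methods flag amounts to a set intersection. The sketch below uses hypothetical listing IDs, not the project's results, to show how the overlap and the method-specific candidates could be separated.

```python
# Hypothetical listing IDs flagged by each method (not the real results).
score_suspects = {101, 102, 103, 104}
regression_suspects = {103, 104, 105}

# Flagged by both methods: the common fake listing candidates.
common = score_suspects & regression_suspects

# Flagged by only one method: worth reviewing to see why the methods disagree.
only_score = score_suspects - regression_suspects
only_regression = regression_suspects - score_suspects

print(sorted(common), sorted(only_score), sorted(only_regression))
```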

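🏠 Final conclusion

Scoring-based detection relies on intuitive criteria and can quickly screen out suspicious listings,
while the regression-based method allows a more refined judgment through pattern analysis.

When the two methods are used together to complement each other,
listings with a high likelihood of being fake can be selected with greater confidence than with either method alone.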

## Ames intro
  • Blah blah
Continuous > box plot / categorical > graph, results by condition
Regression model
Conclusion and areas for improvement
  1. Conclusion
  2. Limitations of the analysis